METHOD FOR FAST NETWORK-AWARE CLUSTERING



This application is a continuation-in-part of U.S. Patent Application No. 09/603,154, filed
on July 23, 2000, which claims the benefit of United States provisional application
No. 60/151,194, filed on August 27, 1999, the contents and disclosure of which are fully
incorporated herein by reference.

This application also claims the benefit of United States provisional application
No. 60/215,302, filed on June 30, 2000 and United States provisional application
No. 60/234,511, filed on September 22, 2000, the contents and disclosure of which are fully
incorporated herein by reference.

BACKGROUND OF THE INVENTION

This invention relates to a method of grouping or clustering clients, servers and/or other
entities within a network to optimize and expedite the flow, transfer, redirection and/or
redistribution of data and information within the network and, more particularly, to a method for
fast network-aware or on-line clustering which uses a radix-encoded trie process to perform
longest prefix matching on one or more client and/or server network IP addresses in order to
cluster the clients and/or servers into proper clusters.

Servers, such as proxy servers, cache servers, content distribution servers, mirror servers
and other related servers are typically used to speed the access of data and reduce response time
for network client requests in a network, such as the World Wide Web. Generally, these network
clients issue requests for information, such as Hypertext Transfer Protocol (HTTP) requests
for one or more Web pages. These requests are then
handled directly or indirectly by these servers, such as proxy servers, cache servers, content
distribution servers and mirror servers, to hopefully expedite the accessing and transfer of the
requested information.

Generally, these servers either act as intermediaries or as transfer or redirection points for
client requests in the network. For example, in operation, a proxy server receives a request for
an Internet service (such as a Web page request) from a user. If the request passes filtering
requirements, the proxy server looks in its local cache of previously downloaded Web pages. If
the server finds the page, the page is returned to the user without needing to forward the request
to, for example, a World Wide Web server on the Internet. If the page is not in the cache, the
proxy server, acting as a client on behalf of the user, requests the page from the server out on the
Internet. When the page is returned, the proxy server relates it to the original request and
forwards it on to the client user.

Strategically designing placement of proxies in the network can benefit greatly from 
clustering network client users who are from the same network together so that the proxy server 
can adequately and efficiently serve these respective client clusters. Mis-characterizing clients 
as being in the same network may result in a proxy server being placed such that it impracticably 
and inefficiently serves these clients, resulting in degraded performance in the network.

In the case of, for example, a cache or a content distribution server, the user's HTTP 
request at an originating server is typically re-routed away from the originating server and on to a 
cache server "closer" to the user. Generally, the cache server determines what content in the 
request exists in the cache, serves that content, and retrieves any non-cached content from the 
originating server. Any new content may also be cached locally in the cache server. 

Similar to the strategic placement of proxies, the placement of cache servers, content 
distribution "boxes" or servers and related mirror servers can be best made by accurately 
clustering clients together in the network. Performance in the network may thus be improved by
accurately and properly clustering multiple network clients together in related client clusters.
The servers, whether they are cache servers, content distribution servers and/or mirror servers 
can then efficiently service these client clusters. 

Knowledge of these network clusters, such as identifying certain "busy" clusters from
which a certain level of network traffic originates, can be used in a variety of different
applications. For example, a busy Web site may want to provide tailored responses and/or
Quality of Service differentiation based on the origin of requests to the Web site. Web sites
and/or servers may also be able to dynamically perform automatic user request re-direction where
needed in the network based on clustering information. However, such information needs to be
captured on an efficient, expedited and real-time basis without any undue lag time which may be
experienced by the Web site requester.

Accordingly, it would be desirable to have a method for accurately clustering clients,
servers and other entities within a network together to guide placement of proxies, cache servers,
content distribution servers and mirror servers within the network. It would also be desirable to
have a method for fast on-line clustering which may be used in applications such as content
distribution, proxy positioning, server replication and network management.

SUMMARY OF THE INVENTION

The present invention is a method for guiding placement of servers, such as proxy
servers, cache servers, content distribution servers and mirror servers within a distributed
information network. The method uses information from at least one network log, such as a
server log or proxy log, and at least one network routing table to arrange clients into related
client clusters. In one embodiment, the method includes the steps of generating a unified
prefix/netmask table from a plurality of extracted prefix/netmask entries, extracting a plurality of
client IP addresses from the at least one network log, comparing each of the plurality of client IP
addresses with entries in the unified prefix/netmask table to determine a common longest prefix
matching between each of the plurality of client IP addresses and the entries in the unified
prefix/netmask table and grouping all of the client IP addresses which share the common longest
prefix matching into at least one client cluster. Each client within a client cluster will share a
common network address prefix from the unified routing table with the other clients in the same
client cluster.

Preferably, a number of different routing table snapshots are used in extracting entries for
the unified prefix/netmask table. These multiple entries from the different prefix/netmask tables
are unified into a singular format and then merged into a single table.

Network servers, such as proxy servers, cache servers, content distribution servers and
mirror servers may be assigned to one or more clusters based on a number of factors such as the
number of clients within the cluster, the number of requests issued, the URLs accessed and the
number of bytes fetched from a server, such as a Web server.

The present invention is also a method for on-line network-aware clustering. In one
embodiment, on-line network-aware clustering includes extracting client IP addresses,
performing longest prefix matching on each client IP address and classifying all the client IP
addresses that have the same longest matched prefix into a client cluster, wherein the longest prefix
matching is performed in accordance with a radix-encoded trie process. In other embodiments,
the on-line network-aware clustering may be performed to detect server clusters, instead of client
clusters, in the network. In such an embodiment, on-line network-aware clustering for server
clusters includes extracting server IP addresses from one or more proxy logs, performing longest
prefix matching on each server IP address and classifying all the server IP addresses that have
the same longest matched prefix into a server cluster, wherein the longest prefix matching is
performed in accordance with a radix-encoded trie process. In accordance with the teaching of
the present invention, network-aware clustering may also be used to perform server replication or
other related network applications.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary network configuration in accordance with the teachings of the 
present invention. 

FIG. 2 illustrates an exemplary method in accordance with the teachings of the present invention.

FIG. 3 illustrates an exemplary routing table containing routing information.

FIG. 4 illustrates an exemplary method for creating a unified routing table in accordance with the
teachings of the present invention.

FIG. 5 illustrates an exemplary table containing routing information and unified routing
information.

FIG. 6 illustrates an exemplary method for clustering clients in accordance with the teachings of
the present invention.

FIG. 7 illustrates an exemplary method for network-aware clustering in accordance with the
teachings of the present invention.

FIG. 8 illustrates an exemplary radix-encoded trie structure in accordance with the teachings of
the present invention.

FIG. 9a illustrates an exemplary code implementation of a radix-encoded trie in accordance with
the teachings of the present invention.

FIG. 9b illustrates another exemplary code implementation of a radix-encoded trie in accordance
with the teachings of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a method for clustering clients and allocating servers,
such as proxy servers, cache servers, content distribution servers and mirror servers, to those
client clusters in a distributed information network, such as the World Wide Web. The present
invention also relates to a method for clustering clients and servers in a distributed information
network to aid in engineering and shaping traffic within the network, such as may be done in, for
example, a content distribution application. The present invention includes methods for both
off-line and on-line or fast network-aware clustering of both clients and servers within the network,
the on-line clustering methods preferably being performed according to a radix-encoded trie or
retrie as discussed in more detail later herein.

Referring to FIG. 1, an exemplary network 10 configured in accordance with the
teachings of the present invention is shown. The network 10 includes a number of clients, such
as clients 20 (C1, C2, ... Cn) which are clustered together in a client cluster 30, clients 40 (CA1,
CA2, CA3, ... CAn) which are clustered together in a client cluster 50 and clients 60 (CB1,
CB2, ... CBn) which are clustered together in a client cluster 70. Client cluster 30 is in
communication with servers 32, 34 which together form a server cluster 36, client cluster 50 is in
communication with servers 52, 54 and 56 which form a server cluster 58 and client cluster 70 is
in communication with a single server 72. In the present invention, servers 32, 34, 52, 54, 56
and 72 may be any one of proxy servers, cache servers, content distribution servers and/or mirror
servers. For example, server 32 and server 34 may be proxy servers such that server cluster 36 is
a proxy server cluster.

Server cluster 36 including servers 32 and 34, server cluster 58 including servers 52, 54
and 56, and server 72 are in further communication with a server, such as a World Wide Web
server 90. World Wide Web server 90 may be any server available on the Internet which is
responsive to requests to and from any one of the clients and/or servers. For example, World
Wide Web server 90 may be a server which receives and responds to requests for Web pages related
to one or more Web sites which are resident on the server. Other network configurations are
possible provided the network servers, such as the network proxy servers, cache servers, content
distribution servers and mirror servers, are allocated to properly clustered client clusters as
discussed in more detail later herein.

In the present invention, the placement and configuration of the servers and server
clusters, such as the proxy servers and related proxy server clusters, cache servers and related
cache server clusters, content distribution servers and content distribution server clusters and
mirror servers and related mirror server clusters, depends on the clustering of clients within the
network. For example, using a content caching scheme for illustrative purposes, a client may
issue a request for content, such as an HTTP request to a World Wide Web server. This may be
performed by the Web client clicking on a URL that is "content-delivery enabled", i.e., the URL
includes the use of a special routing code that redirects the Web page request to the optimum or
"closest" server. This "content-delivery enabled" URL will re-route that client's request away
from the site's originating Web server and on to a cache server or cache server cluster that is
better suited to serve the client.

Referring to FIG. 2, an exemplary embodiment of a method for clustering clients and
assigning or allocating servers to these client clusters is shown. In this embodiment, a unified
routing information table is created, step 110. The unified routing information table preferably
includes routing information from one or more routing tables, such as network routing prefix and
netmask information. For background purposes, a netmask is a series of bits designed to "mask"
or conceal certain portions of an IP address. Typically, the standard netmask for a class C
network is 255.255.255.0, where the "255.255.255" prefix portion identifies the network
number and the last octet, ".0", is the actual machine number or subnetwork number. Referring
again to FIG. 2, clients within the network are classified into client clusters based on information
from the unified routing information table, step 120. Servers, such as proxy servers, cache
servers, content distribution servers and mirror servers may then be assigned to these client
clusters, step 130, as discussed in more detail later herein.
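
For illustration only, the masking performed by a netmask may be sketched in Python as follows.
The function name split_address and the sample address 192.75.72.14 are assumptions introduced
for this example and are not taken from the figures.

```python
import ipaddress

def split_address(ip, netmask):
    """Split an IPv4 address into (network number, host number) using a netmask."""
    addr = int(ipaddress.IPv4Address(ip))
    mask = int(ipaddress.IPv4Address(netmask))
    network = ipaddress.IPv4Address(addr & mask)   # bits kept by the mask
    host = addr & ~mask & 0xFFFFFFFF               # bits concealed by the mask
    return str(network), host

# Class C style netmask: the first three octets name the network.
print(split_address("192.75.72.14", "255.255.255.0"))  # ('192.75.72.0', 14)
```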

As shown in FIG. 2, the present invention utilizes routing table information from,
preferably, two or more routing tables to create a unified routing information table. For
background purposes, a router is a device or, in some cases, software in a computer, that
determines the next network point to which a packet should be forwarded toward its destination.
The router decides which way to send each information packet based on its current
understanding of the state of the networks it is connected to. Typically, routers create or
maintain a routing table of the available routes and their conditions and use this information
along with distance and cost algorithms to determine the best route for a given packet. An
exemplary routing table or routing table "snapshot" 160 is shown in FIG. 3. As shown, the
routing table or routing table snapshot 160 may include information such as network prefix and
netmask information 170, network identification information 180 and AS path information 190.
Other additional information such as next hop IP address and AS number, etc. may also be
available via the network routing table. In an exemplary embodiment, as discussed above,
information from a plurality of routing tables is used to create the unified routing table for use
in clustering clients together.

Referring to FIG. 4, an exemplary method for creating a unified routing table is shown.
To create the unified table, a number of prefix/netmask entries are extracted from a number of
routing tables or routing table snapshots, step 200. Although only a single routing table or
routing table snapshot may be used, preferably two or more routing tables or routing table
snapshots are used since any one table is unlikely to contain the desired information on all the
necessary prefix/netmask entries. Each router in a network such as the World Wide Web will
typically only see a limited set of traffic; it is therefore desirable to use a multiplicity of different
routing tables from different routers in order to obtain a more complete set of routing
information. The prefix/netmask entries from the various tables are unified into a single
standardized format, step 210, as discussed in more detail later herein. The standardized
prefix/netmask entries are then merged into a single unified table, step 220, to aid in clustering
together clients in the network. Typically, the unified routing table will be created periodically
to incorporate possibly updated information from the routing tables in the network. The unified
routing table may be created or generated at any interval such as every two hours, once a month
or ten times a year as desired.

Referring to FIG. 5, a network prefix/netmask entry may be in one of three formats as
shown in a tabular form. A first exemplary format 230 is configured generally as
x1.x2.x3.x4/k1.k2.k3.k4 and is used in exemplary routing tables MAE-EAST, MAE-WEST,
PACBELL and PAIX, where x1.x2.x3.x4 and k1.k2.k3.k4 are the network prefix and netmask,
respectively, with zeroes dropped at the end or tail. One such example is 193.1/255.255, which
corresponds to 193.1.0.0/255.255.0.0, where 193.1.0.0 and 255.255.0.0 are the network prefix and
netmask, respectively.

Referring again to FIG. 5, another exemplary network prefix/netmask entry format 240
may also be configured as x1.x2.x3.x4/l as in routing tables at ARIN, AT&T, CANET, NLANR
and VBNS, where x1.x2.x3.x4 is the prefix and l is the netmask length. For example,
128.148.0.0/16 stands for 128.148.0.0/255.255.0.0, where 128.148.0.0 and 255.255.0.0 are the
network prefix and netmask. Additionally, another exemplary prefix/netmask entry format may
be configured as x1.x2.x3.0 which can also be found in CANET, and is an abbreviated
representation of x1.x2.x3.0/k1.k2.k3.0. For example, 130.15.0.0 is an abbreviated
representation of 130.15.0.0/255.255.0.0. Of course, other formats may exist and may be
utilized herein provided the different formats are standardized to a singular format to aid in
clustering clients in the network.

In the present invention, the network prefix/netmask entries are unified into a single
standardized format as previously discussed herein and shown as step 210 in FIG. 4. Any one of
the formats as discussed above or other network prefix/netmask formats that may exist will
preferably be converted into this single standardized format. In one exemplary embodiment, the
format x1.x2.x3.x4/k1.k2.k3.k4 is chosen as the standardized format. For instance, any network
prefix/netmask entries in the format of x1.x2.x3.x4/l and/or the format x1.x2.x3.0 will be
converted into the format x1.x2.x3.x4/k1.k2.k3.k4 such that prefix/netmask entries
128.148.0.0/16, 130.15.0.0 and 192.75.72.0 will be converted respectively into 128.148/255.255,
130.15/255.255 and 192.75.72/255.255.255. These converted prefix/netmask entries are then
tabulated into a single unified table with all prefix/netmask entries in the same format.
This table may be in a simple tabular form with the multiple prefix/netmask entries listed in a
grid array form.
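
By way of a non-limiting sketch, the unification of steps 210 and 220 might be expressed in
Python as shown below. The names normalize and unify are hypothetical, the sketch keeps the
trailing zero octets for readability rather than dropping them, and the rule used for the
abbreviated format (the mask covers every octet up to the last non-zero one) is an assumption
inferred from the examples 130.15.0.0 and 192.75.72.0 given above.

```python
import ipaddress

def _pad(dotted):
    """Restore trailing zero octets that were dropped from a dotted quad."""
    parts = dotted.split(".")
    return ".".join(parts + ["0"] * (4 - len(parts)))

def normalize(entry):
    """Return (prefix, netmask) as full dotted quads for any of the three formats."""
    if "/" in entry:
        prefix, mask = entry.split("/")
        if "." in mask or int(mask) > 32:          # x1.x2.x3.x4/k1.k2.k3.k4
            return _pad(prefix), _pad(mask)
        bits = (0xFFFFFFFF << (32 - int(mask))) & 0xFFFFFFFF   # x1.x2.x3.x4/l
        return _pad(prefix), str(ipaddress.IPv4Address(bits))
    # Abbreviated format: assume the mask covers every octet up to the last
    # non-zero one (an assumption, see the lead-in).
    octets = _pad(entry).split(".")
    used = max((i + 1 for i, o in enumerate(octets) if o != "0"), default=0)
    return ".".join(octets), ".".join(["255"] * used + ["0"] * (4 - used))

def unify(*tables):
    """Merge normalized entries from several snapshots, dropping duplicates."""
    return sorted({normalize(e) for table in tables for e in table})

# Two hypothetical snapshots that use different entry formats.
table = unify(["193.1/255.255", "128.148.0.0/16"], ["130.15.0.0", "128.148/255.255"])
for prefix, netmask in table:
    print(prefix + "/" + netmask)
```

In this sketch the entry 128.148.0.0/16 and the entry 128.148/255.255 collapse into a single row
of the unified table because both normalize to the same prefix and netmask.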

Referring now to FIG. 6, an exemplary method for clustering clients using the unified
prefix/netmask table is shown. A number of client IP addresses are first extracted from a
network server log, step 600. The server log may be any generally available server log, such as
a World Wide Web server log which collects client request information to the Web server. The
extracted client IP addresses are matched with the entries in the unified prefix/netmask table to
determine a common network address prefix, step 610. Such matching may be performed by
conducting a longest prefix matching on each client IP address with each of the entries in the
unified prefix/netmask table. Once prefix matching has been performed, the client IP addresses
are clustered into respective client clusters, step 620. In each respective client cluster, each of
the clients within a client cluster will share a common prefix, or more specifically, a common
longest prefix matching from the unified prefix/netmask table. A threshold for prefix matching
may be set such that a client IP address has to have at least a certain number of matching digits in
the client IP address prefix with any one of the prefix/netmask entries in the unified
prefix/netmask table before a match is declared. For example, in one embodiment, a client IP
address may have to prefix match at least four digits of any one of the prefix/netmask entries in
the unified prefix/netmask table to be considered a match.
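
As a minimal sketch of steps 610 and 620, the following Python example matches each client IP
address against every entry of a small unified table by linear scan and groups the addresses by
their longest matching prefix. The function names, the table entries other than 128.148.0.0 and
130.15.0.0, and the client addresses are all assumptions made for illustration; the optional
matching threshold described above is not modeled.

```python
import ipaddress
from collections import defaultdict

def longest_prefix_match(ip, table):
    """Return the most specific network in `table` that contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net in table:
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

def cluster_clients(ips, entries):
    """Group client IPs by longest matched prefix; unmatched IPs are left out."""
    table = [ipaddress.ip_network(e) for e in entries]
    clusters = defaultdict(list)
    for ip in ips:
        match = longest_prefix_match(ip, table)
        if match is not None:
            clusters[str(match)].append(ip)
    return dict(clusters)

entries = ["128.148.0.0/255.255.0.0", "128.148.10.0/255.255.255.0", "130.15.0.0/255.255.0.0"]
clients = ["128.148.10.7", "128.148.99.1", "130.15.3.3", "128.148.10.200", "8.8.8.8"]
for prefix, members in cluster_clients(clients, entries).items():
    print(prefix, members)
```

The linear scan is adequate for off-line processing of a log; the on-line embodiment described
later herein replaces it with a table-driven lookup.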

Once clients have been clustered together in client clusters as discussed above, servers,
such as proxy servers, cache servers, content distribution servers and/or mirror servers may be
placed or assigned to these client clusters. Preferably, in the case of proxy servers, the proxy
servers being assigned to these client clusters will be functioning as cache servers and thereby
their optimum assignment or placement will depend greatly on the proper clustering of these
clients. In the present invention, the servers may be assigned to these client clusters based on
one or more factors or metrics such as the number of clients, the number of requests issued, the
URLs accessed, the number of bytes fetched from a server and other related factors. In one
embodiment, more than one server, such as a proxy server, cache server, content distribution
server and/or mirror server may be assigned to the same client cluster or clusters such that the
servers will together form a server cluster, as discussed earlier herein. The servers within a
server cluster will act in concert with one another to service their respective client cluster(s).

In one example, proxy servers, cache servers, content distribution servers and/or mirror
servers may be assigned to a client cluster based simply on the number of clients in the
respective client cluster. For example, a client number threshold may be set such that a server
may be assigned for every instance of the client number threshold reached. A threshold may be
set at any number such as 100, 500 or 1000 clients. In an exemplary embodiment, where the
client number threshold is set at 500, a client cluster containing 4000 clients will require and be
assigned at least eight (8) servers, whether they may be proxy servers, cache servers, content
distribution servers and/or mirror servers. These eight servers together will form a server cluster
which will be placed in front of the client cluster in the network to service the clients' requests to
and from the Internet.
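
The arithmetic of this example may be sketched as follows. Rounding up when the client count is
not an exact multiple of the threshold is an assumption of this sketch, as is the function name
servers_needed.

```python
import math

def servers_needed(num_clients, threshold):
    """One server per `threshold` clients, rounded up (rounding is an assumption)."""
    return math.ceil(num_clients / threshold)

print(servers_needed(4000, 500))  # 8, as in the exemplary embodiment
```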

In another example, proxy servers, cache servers and/or content distribution servers may
be assigned to one or more client clusters based on the number of requests, such as HTTP
requests, which are issued by the clients within each respective client cluster. For example, a
request threshold may be set such that a server may be assigned for client clusters which issue a
certain number of requests which equals or exceeds the threshold. A threshold may be set at any
number depending on the anticipated capacity of the server to be assigned to the client cluster.

In another exemplary embodiment, at least one server, such as a proxy server, cache
server, content distribution server and/or mirror server, may be placed in front of each client
cluster. The servers may be further grouped into server clusters based on their respective
Autonomous System (AS) numbers and respective geographical locations. In this example, all
servers belonging to the same AS and located geographically nearby will be grouped together to
form a server cluster. In addition, in some instances, undesirable network spiders and conflicting
proxy servers are eliminated from a client cluster before placing a server, such as a proxy server,
cache server, content distribution server and/or mirror server, to serve that client cluster.

Referring to FIG. 7, an embodiment for clustering, or more specifically, for on-line
network-aware clustering is shown. In this embodiment, a plurality of client IP addresses are
extracted, step 700. Longest prefix matching is then performed on each client IP address
according to a data structure or radix-encoded trie, step 710. Once longest prefix matching is
performed on each client IP address, all the client IP addresses that have the same longest matched
prefix are classified or grouped into one client cluster, step 720. In this embodiment, to cluster a
set of IP addresses in an on-line network-aware manner, a recursively structured table or data
structure, called a radix-encoded trie or retrie, is used.

As used herein, the radix-encoded trie or retrie is basically a table indexed on some
high-order bits of a given key. Each entry points to another table, indexed by some of the
following bits, and so on. For background purposes, an IP address is, e.g., in IPv4, a 32-bit
integer, and an IP prefix $p$ is an IP address associated with a length $l(p) \in [0, 32]$. Prefix $p$
matches address $a$ if $a \mathbin{\&} ((2^{l(p)} - 1) \ll (32 - l(p))) = p$, where $\&$ is bit-wise AND and
$\ll$ is left shift. Thus, given a collection of $K$-bit keys, consider a top-level retrie, $R$, indexed on
the $k$ most significant bits of a key. $R$ is a table of size $2^k$. Let $\gg$ indicate right shift. Given
key $x$, element $R[x \gg (K - k)]$ points to a second-level retrie, $R'$, indexed on, say, the next $l$
bits. The element of $R'$ corresponding to $x$ is $R'[(x \gg (K - (k + l))) \mathbin{\&} (2^l - 1)]$; and so on.
That is, each retrie has a shift value ($K - k$ in the top level, $K - (k + l)$ in the second level in
this example) and a mask value ($2^k - 1$ in the top level, $2^l - 1$ in the second level); the
top-level mask is superfluous. The shift and corresponding mask values among the lower-level
retries need not be identical.
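
The prefix-matching condition stated above may be checked directly. The following Python sketch
is illustrative only; the helper names to_int and prefix_matches are assumptions, and the sample
addresses are chosen to agree with the prefixes discussed in connection with FIG. 8.

```python
import ipaddress

def to_int(dotted):
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))

def prefix_matches(prefix, length, address):
    """True if `prefix` of the given length matches `address` (all 32-bit ints)."""
    mask = ((1 << length) - 1) << (32 - length)   # (2^l(p) - 1) << (32 - l(p))
    return (address & mask) == prefix

a = to_int("10.24.19.45")
print(prefix_matches(to_int("10.24.16.0"), 20, a))  # True
print(prefix_matches(to_int("10.24.17.0"), 24, a))  # False
```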

As used herein, the retrie may be completely described by a structure containing pointers
to the top-level table and shift and mask values. Standard memory alignment of pointers may be
used to search for key x in retrie r as follows.

    while (!((r = r->table[(x >> r->shift) & r->mask]) & 1));



When the loop exits, the upper 31 bits of r point to the data record for x. To build a retrie for a
set $S$ of IP prefixes, a binary search tree $T$ describing the induced address ranges is first built.
Consider prefix $p = b_1 \ldots b_{32}$, where the $b_i$'s are the bits of $p$ in msb-to-lsb order, and define
the addresses $\mathrm{low}(p) = b_1 \ldots b_{l(p)} 0 \ldots 0$ and $\mathrm{high}(p) = b_1 \ldots b_{l(p)} 1 \ldots 1$, where each
trailing run $0 \ldots 0$ or $1 \ldots 1$ has $32 - l(p)$ bits. Prefix $p$ covers addresses in the range
$[\mathrm{low}(p), \mathrm{high}(p)]$.

Initially, $T$ contains one key, 0, describing the range $[0, 2^{32} - 1]$. We insert each prefix $p$
in $S$, in non-decreasing order by length, into $T$ as follows. Find the predecessor, $x$, of $p$ in $T$. By
induction, the properties of prefixes and the ordering of prefixes by length imply that the range
of $p$ is contained in the range $[x, y)$, associated by invariant with $x$, where $y$ is the successor of $x$
in $T$. Insert $\mathrm{low}(p)$ and $\mathrm{high}(p) + 1$ into $T$, associating range $[\mathrm{low}(p), \mathrm{high}(p)]$ with $\mathrm{low}(p)$. The
remainder of the original range, $[x, y)$, associated with $x$ is split into ranges $[x, \mathrm{low}(p))$, associated
with $x$, and $[\mathrm{high}(p) + 1, y)$, associated with $\mathrm{high}(p) + 1$. After construction, an LPM query on
an address $x$ could be performed by a predecessor query of $x$ in $T$.
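
The range construction just described may be sketched in Python as follows. For brevity the
sketch substitutes a sorted list searched with the standard bisect module for the binary search
tree; that substitution, and the names build_ranges and lpm, are assumptions of this example.
The sample prefixes are those discussed in connection with FIG. 8.

```python
import bisect
import ipaddress

def build_ranges(prefixes):
    """Return (keys, labels); labels[i] is the LPM for addresses in [keys[i], keys[i+1])."""
    nets = sorted((ipaddress.ip_network(p) for p in prefixes), key=lambda n: n.prefixlen)
    keys, labels = [0], [None]                     # one range covering all addresses
    for net in nets:                               # non-decreasing order by length
        low, high = int(net.network_address), int(net.broadcast_address)
        i = bisect.bisect_right(keys, low) - 1     # predecessor x of low(p)
        enclosing = labels[i]                      # prefix associated with [x, y)
        nxt = high + 1
        # Insert high(p) + 1 unless it is already a key or lies past the address space.
        if nxt < 2 ** 32 and (i + 1 == len(keys) or keys[i + 1] != nxt):
            keys.insert(i + 1, nxt)
            labels.insert(i + 1, enclosing)
        if keys[i] == low:                         # low(p) coincides with x
            labels[i] = str(net)
        else:                                      # insert low(p) after x
            keys.insert(i + 1, low)
            labels.insert(i + 1, str(net))
    return keys, labels

def lpm(keys, labels, ip):
    """Longest prefix match by predecessor query."""
    x = int(ipaddress.ip_address(ip))
    return labels[bisect.bisect_right(keys, x) - 1]

keys, labels = build_ranges(
    ["10.24.16.0/20", "10.24.17.0/24", "10.24.32.0/20", "128.0.0.0/4", "0.0.0.0/0"])
print(lpm(keys, labels, "10.24.19.45"))  # 10.24.16.0/20
```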

Consider a retrie to be built on some range $r = [x, y)$ (initially $[0, \infty)$) of addresses, and
assume a global threshold parameter $t$. The elements of $T$ within $r$ correspond to the matching
prefixes within $r$. The shift value $s$ and mask value $m$ are in one-to-one correspondence. Let $L$
be the length of the longest prefix within $r$. Ideally, $s$ is set so that $32 - s = L$, i.e., so that the
retrie fully qualifies each prefix within $r$. If the corresponding $m$ exceeds $2^t - 1$, however, that
is, if the table would be too big, then $m$ is set to $2^t - 1$ and $s$ is set accordingly, resulting in
lower-level retries. The table is then populated using the elements of $T$ to map IP addresses to
corresponding LPMs, recursively constructing lower-level retries as necessary. Another global
parameter $k$ determines that the top-level mask is always $2^k - 1$.
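
One possible reading of the shift and mask selection is sketched below in Python. The parameter
bits_consumed (the number of high-order address bits already indexed by higher-level retries) is
an assumption introduced for this sketch and is not named in the description above; the top
level is modeled by passing the top-level width as the cap.

```python
def choose_shift_mask(longest_len, bits_consumed, t):
    """Pick (shift, mask) for a retrie whose longest contained prefix has `longest_len` bits.

    The table is indexed on at most `t` bits; if fewer bits suffice to fully
    qualify every prefix in the range, only those bits are used.
    """
    width = min(longest_len - bits_consumed, t)
    shift = 32 - (bits_consumed + width)
    mask = (1 << width) - 1
    return shift, mask

print(choose_shift_mask(24, 0, 18))   # (14, 262143): an 18-bit top level
print(choose_shift_mask(24, 18, 8))   # (8, 63): 6 more bits fully qualify a /24
print(choose_shift_mask(32, 18, 8))   # (6, 255): capped at t = 8 bits
```

In the capped case the remaining bits would be resolved by a further lower-level retrie.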

Referring to FIG. 8, consider the prefixes 10.24.16.0/20 (A), 10.24.17.0/24 (B), 10.24.32.0/20
(C), 128.0.0.0/4 (E), and 0.0.0.0/0. The top portion of FIG. 8 shows the partition of the 32-bit
address space induced by the prefixes. For example, B is an extension of A, which partitions A's
range, [10.24.16.0, 10.24.31.255], into subranges [10.24.16.0, 10.24.16.255],
[10.24.17.0, 10.24.17.255], and [10.24.18.0, 10.24.31.255], associated with A, B, and A, respectively.
The bottom portion of FIG. 8 shows the radix-encoded trie of the present invention. The first level
has an 18-bit mask, and the second has a 6-bit mask to qualify prefixes A, B, and C fully. Masks and
table indices are in decimal. For example, to search for x = 10.24.19.45, we index (x >> 14) & 262143 =
10336 in the top-level retrie, leading to the second level, which we index by (x >> 8) & 63 = 19,
yielding LPM A.
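
The lookup of this example may be reproduced with the following Python sketch of a fixed
two-level retrie using the 18-bit and 6-bit masks of FIG. 8. This is a simplified illustration
and not the code of FIG. 9a or FIG. 9b: the names build_retrie and lookup are assumptions,
nested Python lists stand in for tagged pointers, and prefixes longer than 24 bits are not
handled.

```python
import ipaddress

TOP_BITS, SECOND_BITS = 18, 6                  # masks 2**18 - 1 and 2**6 - 1
TOP_SHIFT = 32 - TOP_BITS                      # 14
SECOND_SHIFT = TOP_SHIFT - SECOND_BITS         # 8
SECOND_MASK = (1 << SECOND_BITS) - 1           # 63

def build_retrie(prefixes):
    """Build a top-level table whose entries are LPM labels or second-level tables."""
    nets = sorted((ipaddress.ip_network(p) for p in prefixes), key=lambda n: n.prefixlen)
    top = [None] * (1 << TOP_BITS)
    for net in nets:                           # shorter prefixes first
        if net.prefixlen > TOP_BITS + SECOND_BITS:
            raise ValueError("this sketch handles prefixes up to /24 only")
        low, high = int(net.network_address), int(net.broadcast_address)
        if net.prefixlen <= TOP_BITS:          # fully qualified by the top level
            for i in range(low >> TOP_SHIFT, (high >> TOP_SHIFT) + 1):
                top[i] = str(net)
        else:                                  # needs a second-level table
            i = low >> TOP_SHIFT
            if not isinstance(top[i], list):
                top[i] = [top[i]] * (1 << SECOND_BITS)
            first = (low >> SECOND_SHIFT) & SECOND_MASK
            last = (high >> SECOND_SHIFT) & SECOND_MASK
            for j in range(first, last + 1):
                top[i][j] = str(net)
    return top

def lookup(top, ip):
    """Return the LPM label for `ip` using at most two table accesses."""
    x = int(ipaddress.ip_address(ip))
    entry = top[x >> TOP_SHIFT]
    if isinstance(entry, list):                # stands in for the low-bit tag test
        entry = entry[(x >> SECOND_SHIFT) & SECOND_MASK]
    return entry

retrie = build_retrie(
    ["10.24.16.0/20", "10.24.17.0/24", "10.24.32.0/20", "128.0.0.0/4", "0.0.0.0/0"])
x = int(ipaddress.ip_address("10.24.19.45"))
print((x >> 14) & 262143, (x >> 8) & 63)       # 10336 19
print(lookup(retrie, "10.24.19.45"))           # 10.24.16.0/20
```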

Referring again to FIG. 8, the <shift,mask> values are combined into a single value in the
predecessor table, which cuts the number of memory accesses in half. An exemplary code
implementation of the retrie is further provided in FIG. 9a. In this embodiment, the elements in
the last retrie table level contain only the next hop index, which decreases the retrie table size, as
demonstrated by the code, lpmatch(), for (X1)+(X2). Referring now to FIG. 9b, an embodiment
is shown where the number of retrie levels is fixed, e.g., FIG. 9b provides exemplary code,
lpmatch2(), for a 2-level retrie.

In the present invention, clustering may be performed in both software and hardware
implementations which implement the teachings and methods outlined herein. For example, the
longest prefix matching process using the radix-encoded trie may be implemented within either
software or hardware implementations for IP routers to perform network-aware clustering. For
example, longest prefix matching using the radix-encoded trie may be used to assist a router in
determining the next network point to which a packet should be forwarded toward its destination.
The router may create or maintain a table of the available routes and their conditions and use this
information along with distance and cost algorithms, as well as clustering information, to
determine the best route for a given packet. It will be apparent to those skilled in the art that
many changes and substitutions can be made to the system and method described herein without
departing from the spirit and scope of the invention as defined by the appended claims.
We claim: 

1. An on-line method of classifying IP addresses into related clusters within a distributed 
information network, the method comprising the steps of: 

receiving a plurality of IP addresses; 

processing the plurality of IP addresses according to a radix encoded trie classification 
process; and 

classifying the plurality of IP addresses into related clusters. 

2. The method of claim 1, wherein the plurality of client IP addresses are received from one 
or more network routers. 

3. The method of claim 1, wherein the IP addresses are network client IP addresses. 

4. The method of claim 1, wherein the distributed information network is the World Wide 
Web. 

5. A method for on-line grouping of a plurality of Web client IP addresses into related client 
clusters, the method comprising the steps of: 

extracting client IP addresses from a collection of IP addresses; 

performing longest prefix matching on each client IP address; and 

classifying all of the client IP addresses that have the same longest matched prefix into a 
client cluster based on a radix encoded trie matching process. 

6. The method of claim 1, wherein the client IP addresses are extracted in real time from a 
network server. 

7. The method of claim 1, wherein the distributed information network is the Internet. 
8. A method for determining the relationships between a plurality of client IP addresses, the 
method comprising: 

processing the plurality of client IP addresses according to a radix encoding trie (retrie); 
and 

grouping all of the client IP addresses which share the common longest prefix matching 
into at least one client IP grouping. 

9. The method of claim 8, further comprising: 

receiving the plurality of client IP addresses from one or more network servers. 

10. The method of claim 8, wherein the network servers are at least one of proxy servers, 
cache servers, content distribution servers and mirror servers. 

11. The method of claim 8, wherein the at least one IP address is a client IP address. 

12. The method of claim 8, wherein the at least one IP address is a server IP address, wherein 
the cluster is a server cluster. 

13. The method of claim 8, wherein the retrie includes shift, mask values which are 
combined into a single value in a predecessor table. 

14. The method of claim 8, wherein the elements in a last retrie table level contain only a 
next hop index so as to decrease the retrie table size. 

15. The method of claim 8, wherein the retrie includes a fixed number of retrie levels. 

16. The method of claim 8, wherein the number of retrie levels is fixed at two levels. 
17. A computer-readable medium containing executable instructions which cause a computer 
to perform the steps of: 

extracting at least one IP address; 

performing longest prefix matching on the at least one IP address; and 

classifying the at least one IP address into a cluster, wherein the longest prefix matching 
is performed according to a radix-encoded trie. 

18. The computer-readable medium of claim 17, wherein the at least one IP address is a 
client IP address. 

19. The computer-readable medium of claim 17, wherein the at least one IP address is a 
server IP address, wherein the cluster is a server cluster. 

20. The computer-readable medium of claim 17, wherein the radix encoded trie is described 
by the equation: 

while (!((r = r->table[(x >> r->shift) & r->mask]) & 1)) 

where x is the search key and r is the radix encoded trie. 
ABSTRACT 

A method for clustering together network IP addresses is disclosed. A number of IP 
addresses are received and processed to determine which IP addresses share a longest prefix 
matching. The longest prefix matching process is performed according to a radix encoded trie, 
which facilitates on-line clustering of the IP addresses. Client and/or server IP addresses may be 
clustered in accordance with the teachings herein. 
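matching code for the <S,R,H> values generated by the first 
part of the algorithm 

/*
 * multi-level retrie longest prefix match
 * input arguments:
 *   S     top level shift
 *   R     retrie internal nodes
 *   H     retrie leaf nodes (next hop index)
 *   addr  IP address to match
 * return:
 *   0     no match
 *   >0    next hop index
 */

typedef unsigned char uint8;    /* 8-bit type used by the listing */
typedef unsigned int  uint32;   /* 32-bit type used by the listing */

int
lpmatch(uint8 S, uint32* R, uint8* H, uint32 addr)
{
    uint32 b;
    uint32 x;

    x = R[addr >> S];
    while ((b = x >> 27) != 0)      /* top 5 bits: address bits used at the next level */
    {
        S -= b;                     /* assumption: advance past the bits just consumed */
        b = (x & ((1<<26)-1)) + ((addr >> S) & ((1<<b)-1));
        if (x & (1<<26))            /* flag bit: the next level is the leaf table */
        {
            x = H[b];               /* assumption: leaf entries hold the next hop index */
            break;
        }
        x = R[b];
    }
    return x;
}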
int
lpmatch2(uint8 S, uint32* R, uint8* H, uint32 addr)
{
    uint32 x;

    x = R[addr >> S];
    return H[(x & ((1<<26)-1)) + ((addr & ((1<<S)-1)) >> (S - (x>>27)))];
}
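The two-level lookup can be exercised with a small hand-built table. The sketch below restates the FIG. 9b expression with `<stdint.h>` types under the hypothetical name `lpmatch2_sketch`; the table contents (10.0.0.0/8 mapped to next hop 1, 10.128.0.0/9 mapped to next hop 2, top-level shift of 24) and the helper names are assumptions chosen only to illustrate how the indices are formed.

```c
#include <stdint.h>

/* Restatement of the FIG. 9b lookup using <stdint.h> types. */
static int lpmatch2_sketch(uint8_t S, const uint32_t *R, const uint8_t *H,
                           uint32_t addr)
{
    uint32_t x = R[addr >> S];
    return H[(x & ((1u << 26) - 1)) +
             ((addr & ((1u << S) - 1)) >> (S - (x >> 27)))];
}

/* Hypothetical tables: the top level is indexed by the first octet (S = 24). */
static uint32_t R_demo[256];                     /* zero entry: 0 bits, offset 0 */
static const uint8_t H_demo[3] = { 0, 1, 2 };    /* H_demo[0] means "no match" */

static void build_demo(void)
{
    /* First octet 10: consume 1 more address bit, leaf block starts at H_demo[1].
     * Bit 0 of that block is 10.0.0.0/9 (hop 1), bit 1 is 10.128.0.0/9 (hop 2). */
    R_demo[10] = (1u << 27) | 1u;
}

static int demo_lookup(uint32_t addr)
{
    build_demo();
    return lpmatch2_sketch(24, R_demo, H_demo, addr);
}
```

With these tables, 10.24.19.45 resolves to next hop 1, 10.200.0.1 to next hop 2, and 192.168.0.1 to 0 (no match), each with one read of R and one read of H.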